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•^T ' Abstract 

\Q ' In a companion paper (McRobie(2013) arXiv:1304.3918 1, a simple set of 'elemental' estimators was 

presented for the Generalized Pareto tail parameter. Each elemental estimator: involves only three log- 
spacings; is absolutely unbiased for all values of the tail parameter; is location- and scale-invariant; and 
is valid for all sample sizes N, even as small as N = 3. It was suggested that linear combinations of 
such elementals could then be used to construct efficient unbiased estimators. In this paper, the analogous 
mathematical approach is taken to the Generalised Extreme Value (GEV) distribution. The resulting 






<**j ' elemental estimators, although not absolutely unbiased, are found to have very small bias, and may thus 
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provide a useful basis for the construction of efficient estimators. 



1 Introduction 



Together with the Generalized Pareto Distribution (GPD), the Generalized Extreme Value (GEV) distribu- 
tion is central in extreme value theory. Each distribution has three parameters (/i, cr, £) corresponding to 
location, scale and tail (or sh ape) respe c tively , and the estimation of the tail parameter is a problem that 
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has received much attention. McRobieJ (12013b presented an elegant set of 'elemental' estimators for the 
GPD tail parameter. These estimators: involved three log-spacings; are location- and scale-invariant; and 
are absolutely unbiased for all tail parameters -oo < £ < oo and all sample sizes N > 3. The idea was that 
linear combinations of such elemental estimators could then be constructed which would be efficient, whilst 
preserving the desirable properties of lack of bias and small sample validity. A variety of linear combina- 
tions were considered, and although consistency proofs have yet to be constructed, numerical evidence was 
presented which suggested that, for distributions within the GPD family, the root mean square error for at 
least one simple linear combination converged in proportion to 1 / y/N for all £. 

The idea of this paper is to construct the equivalent elemental estimators for the GEV, using the same 
mathematical approach as was adopted for the GPD. Here we find that the resulting elementals, rather than 
being absolutely unbiased, have a very small bias, typically an order of magnitude smaller than the root 
mean square error. As such, they may thus provide a useful basis within which linear combinations may 
provide efficient estimators. 

In both the GPD and GEV cases, the ultimate aim is of course the application to the wider domain of 
attraction case, using appropriately chosen maxima from any distribution, rather than from the pure GPD 
or GEV families. That extension is left for later consideration. Here, only the results for the pure GEV case 
are presented. 

The elemental estimators are illustrated in Figure [1] First, any sample from a GEV is ordered (with X\ 
the data maximum and Xjy the data minimum). An elemental is then constructed from three log-spacings 
of the order statistics. The construction takes any pair of non-adjacent order statistics X; and Xj, and 
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Figure 1 : The elemental construction for the GEV case is identical to that for the GPD case, except that 
different log-spacing weights are used. 

involves the log-spacing \og{Xi - Xj) (J > 1 + 2) together with two smaller log-spacings log(X/ + i - Xj) and 
\og{Xi - Xj-\) that nestle within it. Location- and scale-invariance is ensured by using only the statistics 
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The resulting elemental estimator takes the form 

iuN = a N (J) log t - b N (J) log t 
where, as will be shown, the weights a^(J) and ^^(7) are given by 
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These enjoy the property that b^K — 1) = cin{K). For the GPD case, the weights did not depend upon the 
sample size N, but for the GEV case, they do. 

The Generalized Extreme Va lue Distribution arises as the limiting distribution of block maxima (see for 
example Embrechts et al.1 (I1999I) V It has distribution function: 



F ( x) = { eXP (- (1+ ^) f ° r ^° where z = fZ£ 
[ exp(-exp(-z)) for <f = <x 



(5) 



The parameters fi and £, can take any value on the real line, whilst <x can be any positive value. For GEVs 
with positive £, the support (ii - cr/g < x) is bounded below but unbounded at the right. For £ negative, the 
support is unbounded below but bounded above (x < fi - cr/^). Since the right tail is the tail of interest, 
positive £ corresponds to long- or heavy-tailed distributions which are unbounded above, and negative £ 
corresponds to "bounded-above" distributions which have a finite right end point. 



2 Elemental Estimators for the GEV 

The cons t ruction of the elemental estimators for the GEV follows that for the GPD case presented in 
McRobid d2013l) . The detailed construction is given in Appendix 1 here. In outline, a sample of size Af 



is drawn from a member of the pure GEV family. The sample is then ordered, with X\ the sample maxi- 
mum and Xn the sample minimum. The location- and scale-invariant statistics r and t are then constructed, 
these being ratios of data spacings as defined in Fig.Q]and EqnQ] 

The expected values of log r and log t are then determined for the two cases where £, is positive or 
negative. These integrals are expressed via the Probability Integral Transform as integrals of the uniformly- 
distributed distribution function F(X) over the A^-dimensional unit simplex corresponding to the ordered 
sample. 

These integrals readily decompose into a simple part (of the form £(log(- log F))) and a complicated 
part (of the form (log(l - <f>^)))- The simple part leads to a term proportional to £, and forms the core of 
the estimator. For the GPD, the elemental combination of (log t) and (log t) whose simple terms deliver 
the estimate f has the pleasing property that the complicated terms cancel exactly at all £ (such that the 
complicated integrals do not need to be evaluated explicitly). However, in the GEV case, the elemental 
combination whose simple parts deliver the estimate £ does not cancel the complicated parts. However, as 
will be demonstrated numerically, the residual error in this non-cancellation is typically much smaller than 
the variability. 



Expected values of log % and - log 1 for N = 3 



Expected values of a (3) log r and -off ) log t 




Expected error, (^(3) log x -b 3 (1) log t - 5) 



0.08 






0.06 






0.04 
0.02 




s 
x 









A A 




0.02 


WMa^. Ai^y J \ 


■2 


0.04 


v v^v^f^^ I 


«" 


0.06 






0.08 







-4-3-2-1 1 2 3 4 5 

tail parameter ^ 




// 



Figure 2: The elemental construction for N - 3. The first figure shows how the expected values of log t and 
- log t have approximately hyperbolic form, and the second figure shows how scaling these by a^{J) and 
bfiil) makes each asymptotic to the diagonal for large |£|. Their sum then lies very close to the diagonal, 
but not exactly. The third figure shows the small bias, and the fourth shows the standard deviation of the 

estimates. 



Although the construction thus far has been developed by simply mimicking the procedure adopted for 
the GPD elementals, Fig. [2] shows how the construction is actually a natural one for the GEV per se. For 



N - 3, the first figure shows how the expected values of log r and - log t have approximately hyperbolic 
form, asymptoting to zero at one end and to some linear slope at the other. The second figure shows 
how scaling these by apf(J) - «3(3) = -l/(31og(3/4)) and bm{I) = £>3(1) = -l/(31og(2/3)) makes each 
asymptotic to the diagonal for large |£|. The elemental estimator is the summation, and this lies very close to 
the diagonal as desired, but not exactly so. It is asymptotic to the diagonal for large |£| (as may be expected 
since the expected values of the complicated terms log(l - $*) tend to log 1=0 for |£| large), and the 
small but non-zero bias for intermediate values of |£| is shown in the third figure. The standard deviations 
of the estimator are shown in the fourth figure, from whence it can be seen that, at worst (near £ = 0), the 
bias is some 50 times smaller than the standard deviation. These figures were produced numerically, using 
250,000 samples of size N - 3 at each value of £. Although the figures correspond to the single elemental 
available at N = 3, similar figures apply for any elemental at any N. 

In summary, the elementals for the GEV are constructed by choosing weights on(J) and bn(I) which 
guarantee lack of bias at large |£| . As stated, one hopes that the bias at any intermediate value of £ is small, 
and happily it is typically much smaller than the variability there (and even if larger intermediate deviations 
were to occur for larger N, it is clear that linear combinations of elementals could be constructed such as to 
manage the overall deviation). 

3 Evaluation of the Coefficients 

Although the elemental coefficients (Zn(J) and bn(I) are given by a comparatively simple summation (Eqns.|4] 
and |4]i involving binomial coefficients and logarithms, the evaluation becomes fraught with numerical er- 
rors for N > 25 in 32 bit precision computation. The numerical oscillations are illustrated in Fig. [3] That 
this is a numerical instability rather than true behaviour is noted from the fact that if one makes less effort 
to calculate the factorials carefully then the instability occurs at lower N. Even when care is taken, and 
the binomial coefficients are calculated via the standard procedure of using logarithms of gamma functions, 
there is only marginal improvement in numerical performance. The problem arises because the summations 
in Eqns.|3]and|4]sum close to zero, but the individual elements in the sums - the logarithms weighted by 
the binomial coefficients - may be large. Repeated addition and subtraction of these, via the (-1)'" factor, 
means that small rounding errors accumulate and eventually dominate. 

Before describing how these numerical instabilities can be resolved, we note that the coefficients aj?(J) 
follow trivially from the bs(J) coefficients using the relation «#(•/) = bpf(J - 1) (with the nuance that b^{I) 
must also be evaluated at the meaningless value I = N — 1 in order to capture the coefficient ci^iN) which 
would not otherwise be computed). We thus proceed to describe only the calculation of the b^{I). 

We define /?#(/) = l/bff(I), such that/? is the weighted summation and its reciprocal b is the desired 
coefficient. It follows from Eqn. |4]that the coefficients /? can be computed using the recursion relation 

fa(! + 1) = tty^v-iW - ttp 6 ^ (9 



This is seeded by the 1=1 top row 
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Each /3 coefficient in row / + 1 is thus computed from a weighted sum of two terms above it in row /, much 
akin to a standard Pascal's Triangle construction. This elegant construction avoids all need for explicit 
computation of factorials or logarithms of gamma functions . However, even this algorithm does not remove 
the numerical inaccuracies that arise for N > 25, thus an additional approach is required for large N. 

It can be readily seen that, to first order (ignoring terms of order 1 IN), the recursion formula reduces to 

p N -i(fi = (1 - x)p N (I) + x/3 N (I + 1) (8) 



where x = I IN. This is simple linear interpolation amongst the three coefficients. To first order, then, the 
approximate recursion formula Eqn. |8]states that the three values of /3 lie on a straight line when plotted as 
a function of x. This suggests that there is some underlying curve to which the /? values are converging. 
Numerical evidence presented in Fig. [3]b) suggests that 
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(l-JE)log(l-Jt) N 



such that bn{I)IN converges to fix) - -(1 - x)log(l - x) as N increases. An analytical demonstration of 
this is presented in Appendix 2. 





Figure 3: Calculation of the elemental coefficients b^ij). The left hand diagram illustrates erroneous nu- 
merical oscillations from the recursion formula, clearly visible for N - 35. The right hand diagram shows 
the convergence of b^{I)IN (shown as dots for N - 3 to 30) to f(x) = -(1 - x) log(l - x) (shown solid), 
where x - I IN. 



Further numerical analysis suggests that, for finite N, there is a better approximation 
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The numerical evidence for this is shown in Fig. |4]where, for N up to 30, the actual values of b^(J)IN are 
plotted as points and the approximations are plotted using small circular markers. In all cases, even for JV = 
3, each actual value lies well within the corresponding circular marker. The relative error (bT pwx - b^/b^ 
is shown in Fig. [5] showing convergence at any / and a better than 1 % accuracy for all N < 25 (and in most 
cases shown, the accuracy is considerably better). 

In summary, for Af < 25, the b^il) are computed by the recursion formula Eqn. |6] and for A^ > 25 we 
use the approximation for b^il) contained within Eqn. [10] The corresponding values of cln{J) are computed 
from the relation that cjn(K) — b^jK — 1). 

For the GPD case iMcRobid (120 1 31) . the elemental coefficients may be succinctly expressed as 



„GPD 



(J) = J-l and b% PD (l) = / 



(11) 



Unlike the GEV case, these are independent of N. Fig. |5p) shows that the GEV coefficients converge 
towards the GPD weightings for the extreme tails (i.e. for x small), much as may be expected. 
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Figure 4: The approximation of btf(T)/N by Equation[lO] The bN(T)/N values are plotted as points and the 
approximations are plotted using small circular markers. 
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Figure 5: Left: the relative error involved in the approximation of bf4(I)IN by Equation [TOl 
comparison of the elemental coefficients b^il) of the GEV with those of the GPD. 



Right: a 



4 Performance of the elementals 

Fig. |6^) shows the performance (mean and mean ± std. dev) of each of the fifteen elemental estimators 
available at N = 7 when applied to samples drawn from a pure GEV. It can be seen that each elemental 
is very close to being unbiased. It can also be seen that some standard deviations are large. The biases 
are plotted in Fig. [6J5). Also plotted in each of these figures is the performance of the unit sum linear 
combination which gives equal weight to each of the fifteen elementals. As may be expected, the linear 
combination - like the elementals - is almost unbiased, and generally has a considerably smaller standard 
deviation than the individual elementals. 



1 5 elementals and the unit combination, N = 7 
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Figure 6: The left hand diagram shows the performance (mean, and mean + standard deviation) of each of 
the fifteen elementals available for samples of size N — 1. Also shown (circles) is the performance of the 
unit sum linear combination which gives equal weight to each elemental. The right hand diagram shows 
the small bias of the elementals and the linear combination. Plots based on 500,000 samples at each £. 

The question of which linear combination is in some sense optimal is addressed in Fig. Q where seven 
different linear combinations of the individual elementals are considered. The weights in each linear com- 
bination have unit sum. The first combination gives equal weight to each elemental. The next three com- 
binations resemble linear triangular basis function which - prior to normalisation - have unit weight at one 
corner of the upper triangular matrix of possible / and J values, and zero weight at the other two corners. 
The weights are thus proportional to N - J + 1, J - I - I and / respectively. The final three combinations 
are sums of two of the previous three, such that - pre-normalisation - they have unit weight at two corners 
and zero weight at the remaining comer. There is nothing special about this choice of combinations, other 
than that the choice serves as a useful starting point. 



Performance of various linear combinations. N = 7 



Efficiencies relative to the unit weight combinatio 





Figure 7: The left hand diagram shows the performance (mean (solid), and mean + standard deviation 
(dashed)) of seven different linear combinations of the fifteen elementals available for samples of size 
N — 1. The equal weight combination is marked with + signs. The right hand diagram shows the relative 
efficiency (ratio of root mean square errors) of each combination to the equal weight combination (denoted 
N). Plots based on 100,000 samples at each £. 



Fig-lit) shows that the combinations perform similarly. By plotting the ratio of root mean square errors, 



Fig- Up) makes clearer the small differences in performance showing the efficiency of each combination 
relative to the equal weight combination. It can be seen that some combinations perform better in some 
parameter regimes, and that no combination is optimal for all £. Whilst one could consider constructing 
some form of adaptive estimator - a first estimate of £ being used to select the linear combination that 
is efficient in that region - at this stage of the analysis, we simply proceed with any convenient choice 
of combination, such as that with equal weights, or - to mirror the choice used in the GPD case - the 
combination whose weights are proportional to N - J + I. 

5 Consistency 

No attempt is made here at providing an analytical proof of consistency for any combination. One reason 
for this is that small and moderately-sized data sets are often of practical interest, and the usefulness of a 
consistency proof requiring very large sample sizes is then debatable. 

Instead, numerical evidence is provided to demonstrate that root mean square errors of some standard 
linear combinations appear to converge sensibly as sample sizes increase (with samples drawn from pure 
GEVs). Fig. [8^) shows how root mean square errors decrease at a selection of parameter values ^ as sample 
size grows from N - 3 to 30, for the combination whose weights are proportional to N - J + 1 . This 
evidence is extended in Fig. [8}?) to samples of size N = 1000. The apparent linearity of the graphs on the 
chosen axes suggests that root mean square error decays as 1 / "V^V. 
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Figure 8: The left hand diagram shows the performance (mean, and mean ± standard deviation) of the 
elemental combination with weights proportional to N - J + 1, for sample sizes from N = 3 to 30. Means 
are shown (+) with deviations dashed. The right hand diagram extends this to samples sizes up to N = 1000. 
Cases with £, negative are shown (+) and those with £ positive are shown (o). The y-axis is the root mean 
square error (based on 10,000 samples) and the x-axis plots 1 - yZJN. The tendency towards linearity 
under increasing N at each ^ suggests that there is consistency, with convergence at a rate proportional to 

i/Vv. 



6 Comparison with Maximum Likelihood 



Maximum Likelihood Estimation (MLE) is a standard tool for parameter estimation, and Embrechts et al. 



(119991) state that the numerical calculation of ML estimates for the GEV "poses no serious problem in 
principle". Even with their italicised qualification, this is perhaps optimistic. For example, it is well- known 
that MLE applied to the GEV requires regularity conditions that do not hold when £ <= -1/2 (ISmith 



dl987l) ). Moreover, for any data set, there are always parameter sets wherein the likelihood is arbitrarily 
large (i.e. the likelihood is infinite everywhere on the boundary surface 1 + ${x max — jj)/ct = for f < — 1, 
and is arbitrarily large in open neighbourhoods adjacent to this). ICastillo and Daoudil (120091) also show that 
there are occasions where the likelihood function for the two-parameter GPD has no local maximum. 

Fig.|9]shows the estimates obtained via Maximum Likelihood and via the equally-weighted combination 
of elementals for 10,000 samples of size N - 7 drawn from a GEV. The parameters are (//, <x) = (0, 1) and 
£, is uniform random over [-10, 1 01. The Max i mum L ikelihood algorithm used is the gevfit function within 
the Statistics Toolbox of Matlab ( IMathWorksl d2012l) ). The iterative ML analysis took considerably longer 
to run than the straight-forward elemental evaluation, and the unusual pattern of the ML results suggests 
that the algorithm experienced some numerical difficulties. The elemental results, on the other hand, are as 
one may expect - showing unstructured scatter about a diagonal mean. 



equal weight combination, N = 7 





Figure 9: Each diagram shows 10,000 samples of size N - 7, with (fi, cr) = (0, 1) and £ uniform over 
[-10, 10]. To the left are the resulting estimates obtained using Matlab gevfit, and to the right using the 
equal weight combination of elementals. 



Statements about the efficacy of ML which are predicated on the parameters lying within some range 
- such as the regularity condition requiring £ < -1/2 - are somewhat unhelpful, given that the analyst 
typically does not know the parameters, but has only data. As an example, consider an ordered data set 
consisting of three points [-l,x,„,y, 1] where x,,,^ is some value between -1 and 1. Such data could have 
originated from a GEV of any £, thus questions such as whether £ < - 1 /2 are not well posed when given 
only the data. As x m id is varied, the estimates of £ using the gevfit ML algorithm and the (unique) elemental 
combination are shown in Fig. [10] Again, the irregular graph of the ML estimates is indicative of numerical 
difficulties, and there were indeed numerous error messages warning of convergence to boundary points or 
of failure to converge within some pre-designated number of iterations. By contrast, the elemental estimator 
produces a smooth, well-behaved graph whose computation requires no iteration, being a simple one-step 
evaluation of the function f = a log t - b log t with a - 1/(3 log 4/3) and b - 1/(3 log 3/2). Moreover, the 
resulting graph of the elemental estimate conforms reasonably with intuition, in that the centrally-located 
Xmid - leads to the estimate £ = -(log 2 2)/(3 log(3/4) log(2/3)) = -0.23, which is not very far from the 
uniform distribution case of £, - - 1 . (Indeed, there is no reason that the estimate for this uniformly-spaced 
data case should deliver the exact value f = — 1, but some loose agreement is to be expected.) Values of 
Xmid to the right of this correspond to increasingly negative estimates of £, which correspond to the data 
tending to cluster towards the rightmost end point. Similarly, values of x m u further left of centre lead to 
increasingly positive values of £, in correspondence with the tendency of positive £ distributions to cluster 
the data to the left end-point. If one assumes that the elemental estimate is in some sense sensible, then it 
is perhaps surprising to observe that the ML estimate agrees closely with this for large negative values of £, 
the very parameter regime that regularity considerations would have led us to be wary of. 
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Figure 10: The estimates obtained using Matlab's gevfit ML estimator (solid) and the unique elemental 
(dashed) for the sample [-l,x m td, 1], as x m u is varied. 

As further demonstration of the robustness of the elemental approach as compared with the ML ap- 
proach, the estimator performance is compared over a number of idealised data sets. To create an idealised 
sample of size N, the unit interval is divided into N equal segments, the midpoints Fj of which are used, via 
the Probability Integral Transform, to create a sample {x, | x, = F~ l (Fj), i = l,.,.,N] where F is a GEV 
distribution function with parameters (fi, cr, £) with /j. = 0, cr = 1, and £, varying over the interval [-10, 10]. 
(The ordering of i is such that x\ is the data maximum). Although each data set employs a nominal value of 
£, in its construction, any such data set could have arisen by random sampling from a GEV with any other 
value of £. The value of £ used in the idealised construction is thus only nominally associated with that data 
set. 

Fig.QT]compares the resulting gevfit ML estimates with those obtained from the equal weight elemental 
combination, for N — 3, 7, 15 and 3 1 . Again, the irregularities of the ML graphs are indicative of numerical 
difficulties. That these irregularities are present even for N - 31 shows that they are not merely a feature 
of small sample sizes. The elemental estimates however, are well-behaved for all samples, and lie close to 
the diagonal in all cases. Indeed, given the artificiality of the idealised data construction, although some 
approximate diagonal correspondence is to be expected, perfect diagonality is not - especially for Af small. 

In summary, linear combinations of elemental estimators provide computationally efficient and robust 
estimators for the tail parameter, and - even for extreme parameter values (|£| * 10) and very small sample 
sizes (N ^ 3) - do not suffer the numerical difficulties experienced by the ML estimator. 

7 Application to distributions related to Weibull distributions 

If the density function of a GEV is flipped right to left, we obtain a three-parameter distribution related 
to the Weibull distribution. It follows immediately that the shape parameter of a sample drawn from such 
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GEV, evenly spaced F(x), N=3 



GEV, evenly spaced F(x), N-7 





Figure 11: The estimates obtained using the Matlab gevfit ML estimator (solid) and the equal weight com- 
bination of elementals (dashed) for the idealised samples created by uniformly-spaced F, for N = 3,7, 15 
and 31. 

a distribution can be estimated by simply reversing the elemental coefficients. That is, given a sample 
(ordered such that X\ is maximum) drawn from the three parameter (//, cr, £ ) distribution with distribution 
function 



F(l) Jl-exp(-d-^) for^O where ^^f 
[ 1 - exp (- exp (z)) for £ = cr 

an elemental estimator of the parameter £ can be obtained via 

2W = a£G/)logT-&£(7)log/ 
with t and t defined as per Eqn.Q~|and 



(12) 



(13) 



<(/) 



-b N (N +1-7) and bZ(J) = -a N (N +1-1) 



(14) 



For large N, b™(I) converges to Ilog(I/N). 

The practical significance of all this is that we are often interested in heavy-tailed distributions which 
head off unbounded to the right, such as the GEV with positive £. By flipping the GEV right to left, we 
obtain another family of heavy-right-tailed distributions (and these are related to the Weibull distribution). 
These heavy right tails are those that, before flipping, headed off to the left in the GEV for £ negative. 
These tails have a different structure to the positive £ GEV tails, and we now have estimators for the shape 
parameter of this new class of heavy tails. However the shape parameter ( of these new tails should not 
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be confused with their extreme value tail index, the tail parameter £ of the GEV within whose domain of 
attraction this Weibull-related distribution lies. That tail index is 0, since the Weibull-related distribution 



lies w ithin the domain of attraction of the Gumbel distribution, the GEV with £ = dEmbrechts et al 
(fl999k 



A similar reflection could be applied to the GPD estimators of McRobie (2013) but this is of arguably 
lesser interest, given that all left tails in the GPD are bounded below, and thus when flipped, lead to right 
tails that are bounded above. 

8 Summary 

Elemental estimators have been presented for the GEV tail parameter £. These have small bias, are com- 
putationally simple and robust, are applicable to small data sets and are valid for all parameters £, positive, 
zero and negative. Linear combinations of elementals appear to provide efficient, consistent estimators 
when applied to data drawn from pure GEVs, and such combinations appear to have advantages over Max- 
imum Likelihood approaches to the GEV. Further, by a simple reflection, the approach has been shown to 
be applicable to another class of heavy-tailed distributions related to the Weibull distribution. 
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Appendix 1. Derivation of the elemental coefficients 

The Generalised Extreme Value distribution has distribution function 

[ exp [-(1 + ^r 1 ^ 1 for £ * 
F(x) = \ F[ r * *' } * (15) 

[ eX p[-exp(-(^))j for£ = 

The support is [i - cr/t; < x for £ positive and x < \x - <x/^ for £ negative. For now, we ignore the ^ = case. 
Inverting the distribution function via the Probability Integral Transform gives 

x = u(F) = M + j [(- log F)-*-l] (16) 

Let X be an ordered sample of size N, (X^ < X^-i < . . . < X2 < X\). The expected value of any 
function h(X) is then 

(h(X)) = N\ dFi... I dF N h(u(F)) (17) 
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the integral being over the A^-dimensional unit simplex defined by the ordering. 

For any function of just two data points X, and Xj, (i < j), all other variables can be integrated out to 
leave 



where 



(h(X i> Xj)) = c ijN 



CijN ~ 



f dF t f ' dFj fijNiFuF^hiuiFdMFj)) 

Jo Jo 

and f ijN (x, y) = (l-x) i - 1 (x- y) H - x f~ J 



(18) 



(19) 



(i-l)\{j-i-l)\(N-jy. 

The expected value of any function h(X k ) of a single data point, is likewise, after a further integration, 
given by 



where 



(h(X k )) = d m 



f 

Jo 



dF k g kn {F k )h(u(F k )) 



IkN 



(k-iy.(N-k)\ 
Expressed in terms of F's, a log-spacing is given by 

I 
log(X i -X j ) = 



and g kN {x)^x N - k (\-x) k - 1 



log(f) + 7 log(- logF,) + log 



log(f ) - ylog(-logF,) + log 



_/-log£ L \ 

/ -logF/ V 
\-logFj) 



for i; — -y, y > 
for £, — +y, y > 



(20) 
(21) 

(22) 



Already, the strategy is becoming evident. The zero sum of log-spacing weights in each elemental will 
remove the log(<x/y) terms, and expectation of the log(- log F) terms will provide some constant multiple 
of y that will form the core of the estimator. One hopes that, as in the derivation for the GPD elementals, the 
remaining complicated terms log 1 - ((- log F,-)/(— log Fjyj can be combined in such a way as to vanish. 
Unfortunately this is not the case, although - as will be shown - the residual error is small. 



We shall seek an estimator of the form 

£ IJN - A log t - B log t 

with t and t defined as in Eqn. Q] 

For £ negative (£ = —y, y positive), we obtain almost immediately 

/i \ /i (Xi+i ~~ XjY 
<logr)_ = (log |— — > 



(23) 



X, - X, 



= <lo£ 



1 - 



■ log F I+ 
- log Fj 



> - dog 



1 - 



- log F, 



I dx I dy[c MJN f I+L 
Jo Jo 



logFyy 
JN (x,y) - c IJN fi JN {x,y)] log 1 1 



-log-y 
-logy 



3 /, 



(24) 



Essentially there are only the complicated terms, denoted as I\ . 
Similarly, continuing with £, negative, we obtain 



<logr>_ 



do 



Xr-X 



./-I 



Xr-Xj 
y(log(- log F/_i)> - y<log(- log Fj)) 



+<log 



1 



- log Fi 
■logFj_i 



> - <lo£ 



1 



logF- xr 



lOg Fj 
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As earlier, the complicated terms can be collected into a single integral, by defining 

h = I dx j dy[cij-\jffi,j-isr(x,y)-cij N f IJN (x,y)]log 1 - I— : (25) 

The leading log(- log F) terms are each a function of a single data point, and thus via Eqn. [21] 



log t>_ = r I [rfy-i,^/-i,jv(x) - d JN g JN (x)] log(- log x) rfx + / 2 

Jo 



Denoting the term in square brackets as \DG(xj], we have 

\dg] = — x N ~ M a - xY~ 2 — ^- ] a - xY- 1 

1 OJ (j-2y.(N-j + iy u x) (j-iy.(N-jy ( } 

- [(J - l)x N - J+1 (l - x) J - 2 -(N-J+ 1)^(1 - x) J - 1 ] 



(J - l)l(N - J + I) 
Each of the (1 - x) p terms can be expanded using the Binomial Theorem to obtain 

[DG] J " ) (S % w . - ^! (-iy^- - S (W -// ', )(J > H P7) 

i j-i I \*—i m'.(J — 2 — my. *—( m'.(J—l—my 

In the first summation, we relabel the dummy summation index m using k — m+\ , and then relabel kasm 
(i.e. m becomes m— 1) giving 

= / n \ f « (/-!)(/ -2)! m _ 1 ^ y+ „, _ £ (iy-y + i)(y-i)! J 



\,m=l m=0 

Collecting terms for each m - l to / - l , we obtain 



( ,., ) U - / + »j» - gc-ir^- ^^-'l), cv - j * i + J 



Noting that the leading — (AT — J + l)x N J term is that which would be obtained if the summation were 
extended down to m = 0, we obtain 

[£>G] = -( /i Jfe-If-^W y ,;/ W-/+l+m)| (29) 

We now need only multiply by log(- log x) and integrate over the unit interval, and use the result that 



!> 



, - log a C 

_l log(- log x) dx = (30) 

a a 



where C is Euler's constant. 

The terms involving Euler's constant are an alternating sum of binomial coefficients and thus sum to 
zero: 

'ts-vrl'- 1 ) = o (3D 

m=0 x ' 

This leaves only the -(log a) /a terms (where a = N— J+1 + m), which give 

J [DG(*)]log(-log*)«te = ( J N l )| £(-!)*( J m 1 )log(iV-/+l+m)l = ^ (32) 
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Collecting all the above, we obtain finally that for £ negative, 

<logr>_ = ^+/ 2 = !+/ 2 (33) 

We now need to repeat the procedure for £ positive. Writing £ = y, y > 0, we readily obtain that 

i _ ( -*** y 
T = i t-***y (34) 

whose expected logarithm was defined in Eqn.l25las F, thus 

<logT> + = / 2 (35) 

Similarly for (log t)+, we obtain an I\ contribution from the complicated integrals, and the log(— log F) 
terms are 

y [<log(- log F,)) - <log(- log F I+l ))] (36) 

These are identical to the terms in the (logr)_ derivation, involving integrations over a single variable, 
except we must make the substitution / - 1 goes to I. We thus obtain 

(log*> + = ^+/i (37) 



with 



l - = J [...]log(-log(x)rfx = -( N j )|X(-ir( I \log(N-I + m)\ (38) 



B 
In summary, 

<logf>- = h 

<logr>_ = |+/ 2 

<log/> + = ^+/i 

(log r>+ = 7 2 

(39) 

whence, for positive or negative £, we have 

A<logT>-fl<logf> =^ + A7 2 -B/i (40) 

This is an unbiased estimator for £ provided that the we ighted sum of the complicated terms A/ 2 - BI\ - 0. 



Although the equivalent terms in the GPD derivation (IMcRobia (12013b ) cancelled exactly, this is not the 



case for the GEV here. However the resulting bias is small. It also vanishes as |£| becomes large. 
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Appendix 2. Approximation of the b N (I) coefficients 



Writing 



N 



[ N I ) S i with s ' = - E ( m ) (_1) '" l0g{N ~ I + m) (41) 



and /(x) = -(1 — x)log(l - x) with jt = — 



we show that 



b N {T) 
N 



/(*) 



x / 1 



(42) 



(43) 



The demonstration begins by writing the ratio of the approximation Nf(x) to the true value of the 
coefficient b^{I) as 



b N (I)/N 



N 



N 



Si fix) 



(44) 



and we expand each of the three terms on the right-hand side as a power series in 1 /N, keeping the first 
three nonzero terms in each expansion. 



Term 1: 



N 



N 



N.N(N - l)(N - 2) . . . (iV - 7 + 1) 



7! 



7! I N \ N 



7-1 

N 



N- 



/+! 



7! 



/(/-I) | /(/-l)(/-2)(3/-l) | / 1 

2N 24N 2 \N 3 



(45) 



Term 2: 



Sl = M'V-irwr-i+m) = (-i) /+1 Z(n(-i) fe iog(i-| 

m=0 v 7 t=0 v ' \ ' 



(46) 



*=<) 



We now use the relations 



(-1) 7 £(-1) 



/(=o 



I ' .# 



giving 



for 7 = 1,... ,7-1 

7! for j = I 

[7(7 + l)/2] /! for 7 = 7 + 1 

[/(/ + !)(/ + 2)(37 + l)/24] 7! for 7 = / + 2 



/! 7(7+1)7! 7(7+ 1)(7 + 2)(37+ 1)7! / 



(47) 



7JV 7 2(7 + 1)N I+1 
7! 



24(7 + 2)JV /+2 



JV /+ 



7JV 7 



i + il + 7V+l)(37+l) /I 

2JV 24N 2 I A 73 



(48) 



This result is worthy of comment. Essentially we have expanded each log( 1 - k/N) as a power series 
in k/N. When these series are weighted by the binomial coefficients and summed then - rather remarkably 
- the first 7-1 terms disappear. For example, if calculating b^(I) with 7 = 100, the first 99 terms in these 
series disappear, and the first non-zero terms are those involving (k/N) lQ0 . 
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Term 3: 



/(jc) = -(l-Jc)log(l-Jc) = -(l-x) 



-x- X -- X - + o(x<) 



2 3 

X X 3 



? fi \ J N 



N 



, l l J 1 

1 T+0 — r 

2JV 6JV 2 \JV 3 



(49) 



Combine terms 

Each of the three terms is now in the form c,(l +at/N + b,/N 2 + 0(1 /N 3 )) thus their product can be written 



m 

b N (I)/N 



N 



N 



Sif(x) = C 



1 + — + — 7+0 — r 

JV JV 2 \JV 3 



(50) 



with 



C = cic 2 C3 = 



yv 



7+1 



IN' \n) 



a\ + «2 + «3 



/! /\W/ 

-/(/ - 1) I 2 



2 + T=° 



and B = a\ai + «2fl3 + ^3^1 + b\ + £>2 + &3 



-/(/-I) 



-/ 

T 



-/ 

T 



-/(/-i) 



/(/- 1)(/- 2)(37- 1) 7 2 (/+l)(3/+l) I- 
+ 24 + 24 6" 



= — [(-6/ 4 + 6/ 3 ) + (-6/ 3 ) + (6/ 3 - 6/ 2 ) + (3/ 4 - 10/ 3 + 9/ 2 - 27) + (3/ 4 + 4/ 3 + I 2 ) + (-4/ 2 )] 

(51) 



24 
-/ 
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Finally, then 



b N (D/N 



i n 

UN 2 U 3 



or 



MO 



= -(l-x)log(l-*) 



1+ 12^ +0 ^ 



(52) 
(53) 



Thus, for 7 fixed, the coefficient ^(7) converges to — A/(l - x) log(l - jc) as N increases. 
The fore-going demonstration would be improved if the convergence were established at fixed x rather 
than fixed N. However, we note, without further proof, that 



b N (D 

N 



-(l-x)log(l-x)-— log(l-x) 



(54) 



is a better approximation and this is illustrated in Fig. [4] 

The following examples illustrate the somewhat unusual nature of this approximation. Consider the 
caseJV = 20, 7 = 3. 



^20(3) = 



-3.2.1 



1 



20.19.18 



lu &\18 3 .20/ 



-17 + 



12(20)/ e \20 



(55) 
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for which a hand-held calculator gives 2.76494701 « 2.76494671. The error is w 3 x 10~ 7 , which is 
substantially better than the (9(1 /W 3 ) suggested by the derivation. 

Although the approximation was developed for large N, it works surprisingly well even for the smallest 
possible N (namely N = 3), leading to somewhat unusual results such as 

MD= ~* - -[2+-^-)log(2/3) or 0.8221*0.8222 

31og(2/3) \ 12(3)/ 

b iW= -JT^\ x -f 1 + T^) lo 8 (1 / 3 ) or 1-1587*1.1596 

31og(^) \ 12(3)/ 

In summary, whilst it is conceptually simple to write down and evaluate the exact expression for b^il) 
in terms of binomially-weighted sums of logarithms, the numerical instabilities are such that the remarkably 
accurate approximation of Eqn. |54]provides a useful alternative. 
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